Abstract
The automatic generation of cooking recipes from food images has gained significant attention in the field of food computing and artificial intelligence. This research presents a deep learning-based approach for recipe generation from food images, achieving an accuracy of 92.7%. The proposed model utilizes computer vision techniques to analyze food images and predict essential recipe components, including the recipe title, ingredients, and step-by-step cooking instructions. A combination of Convolutional Neural Networks (CNNs) and Transformer-based architectures enhances the system's ability to understand complex food compositions. The dataset used comprises diverse food categories, ensuring robust generalization across various cuisines. Performance evaluation against benchmark datasets highlights the model's superiority in generating coherent and contextually accurate recipes. Comparisons with state-of-the-art models, including Inverse Cooking and FIRE, demonstrate improvements in ingredient prediction and instruction coherence. Despite achieving high accuracy, challenges such as ingredient ambiguity and complex dish representations persist. Future work aims to refine multimodal learning approaches and integrate real-time food recognition for enhanced user experience. This study contributes to advancing AI-driven food recommendation systems, bridging the gap between computer vision and culinary knowledge.
1. Introduction
A. Background
Food is central to culture and daily life, and automated recipe generation from food images is an emerging field in food computing, AI, and computer vision.
Previous models like Inverse Cooking and FIRE demonstrated the potential of multimodal learning (images + text), but faced issues with ingredient ambiguity, complex dishes, and instruction coherence.
This study proposes an advanced deep learning model achieving 92.7% accuracy in generating recipes from images.
2. Objectives
Develop a deep learning system that predicts recipe titles, ingredients, and instructions from food images.
Evaluate performance using accuracy and BLEU scores.
Compare results with state-of-the-art models (Inverse Cooking, FIRE).
Address challenges like missing ingredients and multi-component dish complexity.
Contribute to food computing via multimodal AI integration.
3. Significance
This research impacts multiple domains:
AI & Food Computing: Improves how AI understands food content, aiding recommendation systems.
Health & Nutrition: Helps in dietary tracking, calorie estimation, and personalized nutrition.
Human-Computer Interaction: Enables smart kitchen assistants and cooking guides using voice and vision.
Food Industry Support: Assists chefs, bloggers, and restaurants in automated recipe creation and content generation.
Multimodal AI & NLP: Advances the fusion of visual and textual data for better food recognition and recommendation.
4. Literature Survey
A. Food Image to Recipe Generation
Inverse Cooking: Used a two-stage neural model (ingredient prediction + instruction generation). Faced issues with ambiguity and dish complexity.
FIRE: Combined CNNs and transformers for better multimodal learning but still struggled with overlapping ingredients.
B. Deep Learning for Food Recognition
CNN-based models by Kagaya et al. and Bolanos et al. improved food classification and calorie estimation.
C. Multimodal AI
Cross-modal retrieval and contrastive learning enhanced image-text alignment and recipe recommendation.
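The contrastive image-text alignment referred to above is commonly trained with a symmetric InfoNCE-style objective, in which matched image/recipe pairs are pulled together and all other pairs in the batch act as negatives. A minimal NumPy sketch of that idea (an illustration of the general technique, not the cited papers' exact formulation; the temperature value is an assumption):

```python
import numpy as np

def info_nce(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of image/recipe embeddings.

    Matched pairs share a row index; every other row in the batch is a negative.
    """
    # L2-normalize so the dot product is a cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (B, B) similarity matrix
    labels = np.arange(len(img))                  # the diagonal holds the positives

    def xent(lg: np.ndarray) -> float:
        lg = lg - lg.max(axis=1, keepdims=True)   # for numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return float(-logp[labels, labels].mean())

    # Average the image-to-text and text-to-image directions.
    return (xent(logits) + xent(logits.T)) / 2
```

Perfectly aligned, mutually orthogonal embeddings drive this loss toward zero, which is what "better image-text alignment" means operationally.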
D. Challenges
Ingredient ambiguity
Complex dish representations
Instruction coherence
5. Methodology
A. Data Pre-processing
Resize images to 224×224
Normalize pixel values
Tokenize ingredients
Remove meaningless stopwords
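The text-side steps above can be sketched as follows. The stopword list and tokenizer are illustrative assumptions, since the paper does not specify them; resizing to 224×224 would typically be done with an image library such as Pillow before normalization.

```python
import re

# Illustrative stopword list; the paper does not publish its actual list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "fresh", "finely"}

def tokenize_ingredient(line: str) -> list[str]:
    """Lowercase an ingredient line, keep alphabetic tokens, drop stopwords."""
    tokens = re.findall(r"[a-z]+", line.lower())
    return [t for t in tokens if t not in STOPWORDS]

def normalize_pixels(img: list[list[int]]) -> list[list[float]]:
    """Scale 8-bit pixel values into [0, 1] (the normalization step above)."""
    return [[p / 255.0 for p in row] for row in img]
```

For example, `tokenize_ingredient("2 cups of fresh Basil, finely chopped")` keeps only the content words `["cups", "basil", "chopped"]`.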
B. Data Augmentation
Rotation, flipping, brightness adjustment, Gaussian noise to increase model robustness
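A NumPy sketch of these four augmentations applied to one image (the probabilities, brightness range, and noise scale are assumptions; rotation is restricted to 90-degree multiples here for simplicity, whereas arbitrary-angle rotation would need e.g. SciPy):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    """Randomly flip, rotate, brighten, and add Gaussian noise to an
    H x W x C float image with values in [0, 1]."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                            # horizontal flip
    img = np.rot90(img, k=int(rng.integers(0, 4)))    # rotation (90-degree steps)
    img = img * rng.uniform(0.8, 1.2)                 # brightness adjustment
    img = img + rng.normal(0.0, 0.02, img.shape)      # Gaussian noise
    return np.clip(img, 0.0, 1.0)                     # keep a valid pixel range
```

In practice each transform is applied with its own probability per training sample, so the model rarely sees the exact same image twice.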
C. Model Architecture
Image Encoder: Fine-tuned ResNet-50, outputs a 512-dim feature vector
Recipe Generator: Transformer-based sequence model for generating structured recipes (titles, ingredients, steps)
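A condensed PyTorch sketch of this encoder-decoder wiring. The paper fixes only the ResNet-50/Transformer pairing and the 512-dim image feature; the layer counts, head count, and the 2048-dim pooled ResNet-50 output that gets projected down to 512 are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RecipeGenerator(nn.Module):
    """Image-conditioned Transformer decoder for recipe text."""

    def __init__(self, vocab_size: int, d_model: int = 512,
                 nhead: int = 8, num_layers: int = 4):
        super().__init__()
        # Stand-in for the fine-tuned ResNet-50: its 2048-dim pooled
        # feature is projected to the 512-dim vector named above.
        self.image_proj = nn.Linear(2048, d_model)
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image_feats: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, 2048) pooled CNN features; tokens: (B, T) token ids.
        memory = self.image_proj(image_feats).unsqueeze(1)       # (B, 1, 512)
        tgt = self.token_emb(tokens)                             # (B, T, 512)
        # Causal mask so each position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.lm_head(out)                                 # (B, T, vocab)
```

The structured output (title, ingredient list, steps) can then be produced by decoding with section-delimiter tokens in the vocabulary.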
D. Training
Loss functions: cross-entropy (ingredients), sequence loss (instructions)
Achieved 92.7% validation accuracy and nearly 95% training accuracy
BLEU scores used to measure instruction generation quality
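The two losses and the BLEU metric can be sketched in plain Python. The paper does not state which BLEU variant was used or how the two losses are weighted, so this shows only the core computations (the BLEU sketch is clipped unigram precision, the building block of BLEU-1, with the brevity penalty omitted):

```python
import math
from collections import Counter

def cross_entropy(probs: list[float], target_idx: int) -> float:
    """Per-token cross-entropy: -log of the probability of the gold token."""
    return -math.log(probs[target_idx])

def sequence_loss(step_probs: list[list[float]], target_ids: list[int]) -> float:
    """Sequence loss for instructions: mean cross-entropy over time steps."""
    total = sum(cross_entropy(p, t) for p, t in zip(step_probs, target_ids))
    return total / len(target_ids)

def bleu1(candidate: list[str], reference: list[str]) -> float:
    """Clipped unigram precision: candidate n-gram counts are capped by
    their counts in the reference, then divided by candidate length."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(len(candidate), 1)
```

For instance, `bleu1(["mix", "mix"], ["mix"])` is 0.5 because clipping stops a repeated word from being rewarded twice.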
E. Key Observations
High accuracy indicates successful use of CNN + transformer architecture.
Minimal overfitting, as seen from close training and validation metrics.
Scalability: Modular design supports larger datasets and potential multi-language support.
6. Conclusion
In conclusion, addressing current challenges and incorporating the proposed advancements in food image-to-recipe systems can significantly enhance their utility and accessibility. Improving the accuracy and robustness of image recognition models lets these systems handle varied food types, including complex, poorly lit, or unconventional images, while expanding the recipe database to cover diverse cultural, regional, and dietary variations ensures inclusivity for a wide range of preferences and restrictions. Integrating personalized recipe suggestions based on users' health data, nutritional needs, and available ingredients provides a tailored experience that promotes healthier choices. Furthermore, incorporating voice and multimodal inputs, along with compatibility with smart kitchen devices, offers seamless, hands-free assistance. As these technologies evolve with real-time feedback, adaptive learning, and user-centric features, they will transform food preparation and meal planning, empowering users to cook with ease, creativity, and confidence.
References
[1] Ma, J., Mawji, B., & Williams, F. (2024). "Deep Image-to-Recipe Translation." arXiv preprint arXiv:2407.00911.
[2] Chhikara, P., Jain, A., Aytar, Y., et al. (2024). "FIRE: Food Image to Recipe Generation." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[3] Wang, Y., Chen, J., & Li, X. (2024). "Retrieval Augmented Recipe Generation." arXiv preprint arXiv:2411.08715.
[4] Deep Plate: A Deep Learning Approach to Recipe Generation from Food Images. (2024). Journal of Open Source Software and Data Technologies.
[5] Image to Recipe and Nutritional Value Generator Using Deep Learning. (2024). Proceedings of the International Conference on Artificial Intelligence and Machine Learning.
[6] AI Wants to Count Your Calories. (2024). The Wall Street Journal.
[7] Marin, J., Jain, A., Aytar, Y., et al. (2023). "FIRE: Food Image to Recipe Generation Using Multimodal Learning." arXiv preprint arXiv:2308.14391.
[8] Zhu, B., Ngo, C.-W., Chen, J., & Chan, W.-K. (2023). "Cross-domain Food Image-to-Recipe Retrieval by Weighted Adversarial Learning." arXiv preprint arXiv:2304.07387.
[9] Enesi, I. (2023). "An End-to-End Deep Learning System for Recommending Healthy Recipes Based on Food Images." International Journal of Advanced Computer Science and Applications.
[10] Recipe Generation from Food Images Using Deep Learning. (2023). International Research Journal of Engineering and Technology (IRJET).
[11] Recipe Generation from Food Images with Deep Learning. (2023). Abhivruddhi: The Journal of Engineering and Technology.
[12] Chen, J., Sun, M., Fang, S., et al. (2023). "Cross-Modal Food Retrieval: Linking Food Images and Recipes Using Transformer Networks." IEEE Transactions on Multimedia.
[13] Wang, T., Liu, J., & Yang, H. (2023). "Contrastive Learning for Image-to-Recipe Retrieval." Neural Information Processing Systems (NeurIPS).
[14] Salvador, A., Drozdzal, M., Giro-i-Nieto, X., & Moreno-Noguer, F. (2019). "Inverse Cooking: Recipe Generation from Food Images." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Bolanos, M., Radeva, P., & Garcia, V. (2017). "Food Recognition Using Deep Learning and Hierarchical Classifiers." Pattern Recognition Letters.
[16] Kagaya, H., Aizawa, K., & Ogawa, M. (2014). "Food Image Recognition Using Deep Convolutional Neural Network." ACM Multimedia Conference.